Slovak Dataset for Multilingual Question Answering

Abstract

SK-QuAD is the first manually annotated dataset of questions and answers in Slovak. It consists of more than 91k factual questions and answers from various fields. Each question has an answer marked in the corresponding paragraph. The dataset also contains negative examples in the form of "unanswered questions" and "plausible answers". The dataset is published free of charge for scientific use. We aim to contribute to the creation of Slovak or multilingual systems for generating an answer to a question in a natural language. The paper provides an overview of existing datasets for question answering. It describes the annotation process and statistically analyzes the created content. The dataset expands the possibilities of training and evaluation of multilingual language models. Experiments show that the dataset achieves state-of-the-art results for Slovak and improves question answering for other languages in zero-shot learning. We compare the effect of machine-translated data with manually annotated data. Additional data improve the modeling of low-resourced languages.
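As a rough illustration of the structure the abstract describes (answers marked in a paragraph, plus unanswerable questions that carry plausible answers), the sketch below builds two records in the SQuAD v2.0 style. The field names (`is_impossible`, `answers`, `plausible_answers`, `answer_start`) follow the SQuAD v2.0 convention and are an assumption about SK-QuAD, not something the abstract confirms; the Slovak texts are invented for the example.

```python
# Hypothetical SQuAD v2.0-style records. Field names are assumed, not taken
# from the SK-QuAD paper; the paragraph and questions are invented.
CONTEXT = "Bratislava je hlavné mesto Slovenska. Leží na rieke Dunaj."

# An answerable question: the answer is a span marked in the paragraph.
answerable = {
    "question": "Aké je hlavné mesto Slovenska?",
    "is_impossible": False,
    "answers": [{"text": "Bratislava", "answer_start": CONTEXT.index("Bratislava")}],
}

# A negative example: the paragraph does not answer the question, but an
# annotator marks a span that merely looks like an answer.
unanswerable = {
    "question": "Na ktorej rieke ležia Košice?",
    "is_impossible": True,
    "answers": [],
    "plausible_answers": [{"text": "Dunaj", "answer_start": CONTEXT.index("Dunaj")}],
}


def span_matches(context: str, answer: dict) -> bool:
    """Check that the character offset really points at the answer text."""
    start = answer["answer_start"]
    return context[start:start + len(answer["text"])] == answer["text"]


def gold_answers(record: dict) -> list[str]:
    """Return the answer texts a system should produce (none if unanswerable)."""
    if record["is_impossible"]:
        return []
    return [a["text"] for a in record["answers"]]
```

In this layout a model evaluated on the negative example is expected to abstain, while the plausible answer only records what a misleading span would look like.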

Similar resources

Learning to Translate for Multilingual Question Answering

In multilingual question answering, either the question needs to be translated into the document language, or vice versa. In addition to direction, there are multiple methods to perform the translation, four of which we explore in this paper: word-based, 10-best, context-based, and grammar-based. We build a feature for each combination of translation direction and method, and train a model that ...

Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering

In this paper, we present the mQA model, which is able to answer questions about the content of an image. The answer can be a sentence, a phrase or a single word. Our model contains four components: a Long Short-Term Memory (LSTM) to extract the question representation, a Convolutional Neural Network (CNN) to extract the visual representation, an LSTM for storing the linguistic context in an an...

SQuAD Question Answering Dataset: CS224N Assn 4

We solve the contextual question answering problem, which is an essential part of many automated question-answering datasets. The SQuAD dataset [1] was recently released, and several deep learning approaches have been proposed to solve it. We implement a modified version of one of them, the Dynamic Coattention model, as well as a simple baseline.

Question Answering on the SQuAD Dataset

We develop a deep learning framework for question answering on the Stanford Question Answering Dataset (SQuAD), blending ideas from existing state-of-the-art models to achieve results that surpass the original logistic regression baselines. Using a dynamic coattention encoder and an LSTM decoder, we achieved an F1 score of 55.9% on the hidden SQuAD test set. In this paper, we present the methodo...

Multilingual Question/Answering: the DIOGENE System

This paper presents the DIOGENE question/answering system developed at ITC-Irst. The system is based on a rather standard architecture which includes three components for question processing, search and answer extraction. Linguistic processing strongly relies on MULTIWORDNET, an extended version of the English WORDNET. The system has been designed to address two promising directions: multilingua...

Journal

Journal title: IEEE Access

Year: 2023

ISSN: 2169-3536

DOI: https://doi.org/10.1109/access.2023.3262308